[msan] Handle AVX512 vector down convert (non-mem) intrinsics #147606
This handles llvm.x86.avx512.mask.pmov{,s,us}.*.512 using
handleIntrinsicByApplyingToShadow where possible, otherwise using a
customized slow-path handler, "handleAVX512VectorDownConvert".
Note that shadow propagation of pmov{s,us} (signed/unsigned saturation) is approximated using
truncation. Future work could extend handleAVX512VectorDownConvert to
use GetMinMaxUnsigned() to handle saturation precisely.
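For intuition, here is a standalone scalar sketch (not MSan code) of the shadow rule the patch implements — Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i], with excess destination elements zero-initialized — using the <8 x i64> to <16 x i8> shape of pmov.qb.512 as an example; the names and element counts are illustrative only:

// Standalone scalar model of the shadow rule (illustrative only, not MSan code).
#include <array>
#include <cstdint>

std::array<uint8_t, 16>
downConvertShadow(const std::array<uint64_t, 8> &AShadow,
                  const std::array<uint8_t, 16> &WriteThruShadow, uint8_t Mask) {
  // Elements 8..15 of the destination are zeroed by the instruction, so their
  // shadow stays 0 (fully initialized).
  std::array<uint8_t, 16> DstShadow{};
  for (unsigned I = 0; I < 8; ++I) {
    // Truncation approximation: keep only the low byte of the element shadow.
    uint8_t Trunc = static_cast<uint8_t>(AShadow[I]);
    DstShadow[I] = ((Mask >> I) & 1) ? Trunc : WriteThruShadow[I];
  }
  return DstShadow;
}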
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-compiler-rt-sanitizer

Author: Thurston Dang (thurstond)

Patch is 82.63 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147606.diff

3 Files Affected:
diff --git a/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp b/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
index 36fb7d11b488a..83a419eafc8ec 100644
--- a/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
@@ -4592,6 +4592,86 @@ struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
ConstantInt::get(IRB.getInt32Ty(), 0));
}
+ // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
+ //
+ // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
+ // (<8 x i64>, <16 x i8>, i8)
+ // A WriteThru Mask
+ //
+ // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
+ // (<16 x i32>, <16 x i8>, i16)
+ //
+ // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
+ // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
+ //
+ // If Dst has more elements than A, the excess elements are zeroed (and the
+ // corresponding shadow is initialized).
+ //
+ // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
+ // and is much faster than this handler.
+ void handleAVX512VectorDownConvert(IntrinsicInst &I) {
+ IRBuilder<> IRB(&I);
+
+ assert(I.arg_size() == 3);
+ Value *A = I.getOperand(0);
+ Value *WriteThrough = I.getOperand(1);
+ Value *Mask = I.getOperand(2);
+
+ assert(isa<FixedVectorType>(A->getType()));
+ assert(A->getType()->isIntOrIntVectorTy());
+
+ assert(isa<FixedVectorType>(WriteThrough->getType()));
+ assert(WriteThrough->getType()->isIntOrIntVectorTy());
+
+ unsigned ANumElements =
+ cast<FixedVectorType>(A->getType())->getNumElements();
+ unsigned OutputNumElements =
+ cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
+ assert(ANumElements == OutputNumElements ||
+ ANumElements * 2 == OutputNumElements);
+
+ assert(Mask->getType()->isIntegerTy());
+ assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
+
+ assert(I.getType() == WriteThrough->getType());
+
+ // Widen the mask, if necessary, to have one bit per element of the output
+ // vector.
+ // We want the extra bits to have '1's, so that the CreateSelect will
+ // select the values from AShadow instead of WriteThroughShadow ("maskless"
+ // versions of the intrinsics are sometimes implemented using an all-1's
+ // mask and an undefined value for WriteThroughShadow). We accomplish this
+ // by using bitwise NOT before and after the ZExt.
+ if (ANumElements != OutputNumElements) {
+ Mask = IRB.CreateNot(Mask);
+ Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
+                       "_ms_widen_mask");
+ Mask = IRB.CreateNot(Mask);
+ }
+ Mask = IRB.CreateBitCast(
+ Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
+
+ Value *AShadow = getShadow(A);
+
+ // The return type might have more elements than the input.
+ // Temporarily shrink the return type's number of elements.
+ VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);
+
+ // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
+ // This handler treats them all as truncation.
+ //
+ // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
+ AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
+
+ AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
+
+ Value *WriteThroughShadow = getShadow(WriteThrough);
+
+ Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
+
+ setShadow(&I, Shadow);
+ setOriginForNaryOp(I);
+ }
+
// For sh.* compiler intrinsics:
// llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
// (<8 x half>, <8 x half>, <8 x half>, i8, i32)
@@ -5412,6 +5492,66 @@ struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
break;
}
+ // AVX512 PMOV: Packed MOV, with truncation
+ // Precisely handled by applying the same intrinsic to the shadow
+ case Intrinsic::x86_avx512_mask_pmov_dw_512:
+ case Intrinsic::x86_avx512_mask_pmov_db_512:
+ case Intrinsic::x86_avx512_mask_pmov_qb_512:
+ case Intrinsic::x86_avx512_mask_pmov_qw_512: {
+ // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
+ // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
+ handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
+ /*trailingVerbatimArgs=*/1);
+ break;
+ }
+
+ // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
+ // Approximately handled using the corresponding truncation intrinsic
+ // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
+ case Intrinsic::x86_avx512_mask_pmovs_dw_512:
+ case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
+ handleIntrinsicByApplyingToShadow(I,
+ Intrinsic::x86_avx512_mask_pmov_dw_512,
+ /* trailingVerbatimArgs=*/1);
+ break;
+ }
+
+ case Intrinsic::x86_avx512_mask_pmovs_db_512:
+ case Intrinsic::x86_avx512_mask_pmovus_db_512: {
+ handleIntrinsicByApplyingToShadow(I,
+ Intrinsic::x86_avx512_mask_pmov_db_512,
+ /* trailingVerbatimArgs=*/1);
+ break;
+ }
+
+ case Intrinsic::x86_avx512_mask_pmovs_qb_512:
+ case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
+ handleIntrinsicByApplyingToShadow(I,
+ Intrinsic::x86_avx512_mask_pmov_qb_512,
+ /* trailingVerbatimArgs=*/1);
+ break;
+ }
+
+ case Intrinsic::x86_avx512_mask_pmovs_qw_512:
+ case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
+ handleIntrinsicByApplyingToShadow(I,
+ Intrinsic::x86_avx512_mask_pmov_qw_512,
+ /* trailingVerbatimArgs=*/1);
+ break;
+ }
+
+ case Intrinsic::x86_avx512_mask_pmovs_qd_512:
+ case Intrinsic::x86_avx512_mask_pmovus_qd_512:
+ case Intrinsic::x86_avx512_mask_pmovs_wb_512:
+ case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
+ // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
+ // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
+ // slow-path handler.
+ handleAVX512VectorDownConvert(I);
+ break;
+ }
+
+ // AVX512 FP16 Arithmetic
case Intrinsic::x86_avx512fp16_mask_add_sh_round:
case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
diff --git a/llvm/test/Instrumentation/MemorySanitizer/X86/avx512-intrinsics.ll b/llvm/test/Instrumentation/MemorySanitizer/X86/avx512-intrinsics.ll
index d9ac1b43924bf..a8d32ce5b2719 100644
--- a/llvm/test/Instrumentation/MemorySanitizer/X86/avx512-intrinsics.ll
+++ b/llvm/test/Instrumentation/MemorySanitizer/X86/avx512-intrinsics.ll
@@ -2,6 +2,47 @@
; RUN: opt %s -S -mattr=+avx512f -passes=msan 2>&1 | FileCheck %s
;
; Forked from llvm/test/CodeGen/X86/avx512-intrinsics.ll
+;
+; Strictly handled:
+; - llvm.x86.avx512.add.ps.512
+; - llvm.x86.avx512.cvtsi2ss32, llvm.x86.avx512.cvttsd2si, llvm.x86.avx512.cvttss2si
+; - llvm.x86.avx512.div.ps.512
+; - llvm.x86.avx512.mask.add.sd.round, llvm.x86.avx512.mask.add.ss.round
+; - llvm.x86.avx512.mask.cmp.pd.512, llvm.x86.avx512.mask.cmp.ps.512, llvm.x86.avx512.mask.cmp.sd, llvm.x86.avx512.mask.cmp.ss
+; - llvm.x86.avx512.mask.compress.v16f32, llvm.x86.avx512.mask.compress.v16i32, llvm.x86.avx512.mask.compress.v8f64, llvm.x86.avx512.mask.compress.v8i64
+; - llvm.x86.avx512.mask.cvtpd2dq.512, llvm.x86.avx512.mask.cvtpd2ps.512, llvm.x86.avx512.mask.cvtpd2udq.512, llvm.x86.avx512.mask.cvtps2pd.512, llvm.x86.avx512.mask.cvtps2udq.512
+; - llvm.x86.avx512.mask.cvtsd2ss.round, llvm.x86.avx512.mask.cvtss2sd.round
+; - llvm.x86.avx512.mask.cvttpd2dq.512, llvm.x86.avx512.mask.cvttpd2udq.512, llvm.x86.avx512.mask.cvttps2dq.512, llvm.x86.avx512.mask.cvttps2udq.512
+; - llvm.x86.avx512.mask.expand.v16f32, llvm.x86.avx512.mask.expand.v16i32, llvm.x86.avx512.mask.expand.v8f64, llvm.x86.avx512.mask.expand.v8i64
+; - llvm.x86.avx512.mask.fixupimm.pd.512, llvm.x86.avx512.mask.fixupimm.ps.512, llvm.x86.avx512.mask.fixupimm.sd, llvm.x86.avx512.mask.fixupimm.ss
+; - llvm.x86.avx512.mask.getexp.pd.512, llvm.x86.avx512.mask.getexp.ps.512, llvm.x86.avx512.mask.getexp.sd, llvm.x86.avx512.mask.getexp.ss
+; - llvm.x86.avx512.mask.getmant.pd.512, llvm.x86.avx512.mask.getmant.ps.512, llvm.x86.avx512.mask.getmant.sd, llvm.x86.avx512.mask.getmant.ss
+; - llvm.x86.avx512.mask.max.sd.round, llvm.x86.avx512.mask.max.ss.round
+; - llvm.x86.avx512.mask.pmov.db.mem.512, llvm.x86.avx512.mask.pmov.dw.mem.512, llvm.x86.avx512.mask.pmov.qb.mem.512, llvm.x86.avx512.mask.pmov.qd.mem.512, llvm.x86.avx512.mask.pmov.qw.mem.512
+; - llvm.x86.avx512.mask.pmovs.db.mem.512, llvm.x86.avx512.mask.pmovs.dw.mem.512, llvm.x86.avx512.mask.pmovs.qb.mem.512, llvm.x86.avx512.mask.pmovs.qd.mem.512, llvm.x86.avx512.mask.pmovs.qw.mem.512
+; - llvm.x86.avx512.mask.pmovus.db.mem.512, llvm.x86.avx512.mask.pmovus.dw.mem.512, llvm.x86.avx512.mask.pmovus.qb.mem.512, llvm.x86.avx512.mask.pmovus.qd.mem.512, llvm.x86.avx512.mask.pmovus.qw.mem.512
+; - llvm.x86.avx512.mask.rndscale.pd.512, llvm.x86.avx512.mask.rndscale.ps.512, llvm.x86.avx512.mask.rndscale.sd, llvm.x86.avx512.mask.rndscale.ss
+; - llvm.x86.avx512.mask.scalef.pd.512, llvm.x86.avx512.mask.scalef.ps.512
+; - llvm.x86.avx512.mask.sqrt.sd, llvm.x86.avx512.mask.sqrt.ss
+; - llvm.x86.avx512.mask.vcvtps2ph.512
+; - llvm.x86.avx512.maskz.fixupimm.pd.512, llvm.x86.avx512.maskz.fixupimm.ps.512, llvm.x86.avx512.maskz.fixupimm.sd, llvm.x86.avx512.maskz.fixupimm.ss
+; - llvm.x86.avx512.mul.pd.512, llvm.x86.avx512.mul.ps.512
+; - llvm.x86.avx512.permvar.df.512, llvm.x86.avx512.permvar.sf.512
+; - llvm.x86.avx512.pternlog.d.512, llvm.x86.avx512.pternlog.q.512
+; - llvm.x86.avx512.rcp14.pd.512, llvm.x86.avx512.rcp14.ps.512
+; - llvm.x86.avx512.rsqrt14.ps.512
+; - llvm.x86.avx512.sitofp.round.v16f32.v16i32
+; - llvm.x86.avx512.sqrt.pd.512, llvm.x86.avx512.sqrt.ps.512
+; - llvm.x86.avx512.sub.ps.512
+; - llvm.x86.avx512.uitofp.round.v16f32.v16i32
+; - llvm.x86.avx512.vcomi.sd, llvm.x86.avx512.vcomi.ss
+; - llvm.x86.avx512.vcvtsd2si32, llvm.x86.avx512.vcvtss2si32
+; - llvm.x86.avx512.vfmadd.f32, llvm.x86.avx512.vfmadd.f64
+;
+; Heuristically handled:
+; - llvm.fma.f32, llvm.fma.f64
+; - llvm.sqrt.v16f32, llvm.sqrt.v8f64
+; - llvm.x86.avx512.permvar.di.512, llvm.x86.avx512.permvar.si.512
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@@ -6565,43 +6606,24 @@ define <16 x i8>@test_int_x86_avx512_mask_pmov_qb_512(<8 x i64> %x0, <16 x i8> %
; CHECK-NEXT: [[TMP2:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 64) to ptr), align 8
; CHECK-NEXT: [[TMP3:%.*]] = load i8, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 80) to ptr), align 8
; CHECK-NEXT: call void @llvm.donothing()
-; CHECK-NEXT: [[TMP4:%.*]] = bitcast <8 x i64> [[TMP1]] to i512
-; CHECK-NEXT: [[_MSCMP:%.*]] = icmp ne i512 [[TMP4]], 0
-; CHECK-NEXT: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
-; CHECK-NEXT: [[_MSCMP1:%.*]] = icmp ne i128 [[TMP5]], 0
-; CHECK-NEXT: [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP1]]
-; CHECK-NEXT: br i1 [[_MSOR]], label [[TMP6:%.*]], label [[TMP7:%.*]], !prof [[PROF1]]
-; CHECK: 6:
-; CHECK-NEXT: call void @__msan_warning_noreturn() #[[ATTR10]]
-; CHECK-NEXT: unreachable
-; CHECK: 7:
+; CHECK-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> [[TMP1]], <16 x i8> [[TMP2]], i8 -1)
+; CHECK-NEXT: [[_MSPROP2:%.*]] = or <16 x i8> zeroinitializer, [[TMP4]]
; CHECK-NEXT: [[RES0:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> [[X0:%.*]], <16 x i8> [[X1:%.*]], i8 -1)
-; CHECK-NEXT: [[TMP8:%.*]] = bitcast <8 x i64> [[TMP1]] to i512
-; CHECK-NEXT: [[_MSCMP2:%.*]] = icmp ne i512 [[TMP8]], 0
-; CHECK-NEXT: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
-; CHECK-NEXT: [[_MSCMP3:%.*]] = icmp ne i128 [[TMP9]], 0
-; CHECK-NEXT: [[_MSOR4:%.*]] = or i1 [[_MSCMP2]], [[_MSCMP3]]
-; CHECK-NEXT: [[_MSCMP5:%.*]] = icmp ne i8 [[TMP3]], 0
-; CHECK-NEXT: [[_MSOR6:%.*]] = or i1 [[_MSOR4]], [[_MSCMP5]]
-; CHECK-NEXT: br i1 [[_MSOR6]], label [[TMP10:%.*]], label [[TMP11:%.*]], !prof [[PROF1]]
-; CHECK: 10:
-; CHECK-NEXT: call void @__msan_warning_noreturn() #[[ATTR10]]
-; CHECK-NEXT: unreachable
-; CHECK: 11:
-; CHECK-NEXT: [[RES1:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> [[X0]], <16 x i8> [[X1]], i8 [[X2:%.*]])
-; CHECK-NEXT: [[TMP12:%.*]] = bitcast <8 x i64> [[TMP1]] to i512
-; CHECK-NEXT: [[_MSCMP7:%.*]] = icmp ne i512 [[TMP12]], 0
-; CHECK-NEXT: [[_MSCMP8:%.*]] = icmp ne i8 [[TMP3]], 0
-; CHECK-NEXT: [[_MSOR9:%.*]] = or i1 [[_MSCMP7]], [[_MSCMP8]]
-; CHECK-NEXT: br i1 [[_MSOR9]], label [[TMP13:%.*]], label [[TMP14:%.*]], !prof [[PROF1]]
-; CHECK: 13:
-; CHECK-NEXT: call void @__msan_warning_noreturn() #[[ATTR10]]
-; CHECK-NEXT: unreachable
-; CHECK: 14:
+; CHECK-NEXT: [[TMP8:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> [[TMP1]], <16 x i8> [[TMP2]], i8 [[X2:%.*]])
+; CHECK-NEXT: [[TMP6:%.*]] = zext i8 [[TMP3]] to i128
+; CHECK-NEXT: [[TMP7:%.*]] = bitcast i128 [[TMP6]] to <16 x i8>
+; CHECK-NEXT: [[_MSPROP4:%.*]] = or <16 x i8> [[TMP7]], [[TMP8]]
+; CHECK-NEXT: [[RES1:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> [[X0]], <16 x i8> [[X1]], i8 [[X2]])
+; CHECK-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> [[TMP1]], <16 x i8> zeroinitializer, i8 [[X2]])
+; CHECK-NEXT: [[TMP9:%.*]] = zext i8 [[TMP3]] to i128
+; CHECK-NEXT: [[TMP5:%.*]] = bitcast i128 [[TMP9]] to <16 x i8>
+; CHECK-NEXT: [[_MSPROP:%.*]] = or <16 x i8> [[TMP5]], [[TMP12]]
; CHECK-NEXT: [[RES2:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> [[X0]], <16 x i8> zeroinitializer, i8 [[X2]])
+; CHECK-NEXT: [[_MSPROP3:%.*]] = or <16 x i8> [[_MSPROP2]], [[_MSPROP4]]
; CHECK-NEXT: [[RES3:%.*]] = add <16 x i8> [[RES0]], [[RES1]]
+; CHECK-NEXT: [[_MSPROP1:%.*]] = or <16 x i8> [[_MSPROP3]], [[_MSPROP]]
; CHECK-NEXT: [[RES4:%.*]] = add <16 x i8> [[RES3]], [[RES2]]
-; CHECK-NEXT: store <16 x i8> zeroinitializer, ptr @__msan_retval_tls, align 8
+; CHECK-NEXT: store <16 x i8> [[_MSPROP1]], ptr @__msan_retval_tls, align 8
; CHECK-NEXT: ret <16 x i8> [[RES4]]
;
%res0 = call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> %x0, <16 x i8> %x1, i8 -1)
@@ -6657,43 +6679,24 @@ define <16 x i8>@test_int_x86_avx512_mask_pmovs_qb_512(<8 x i64> %x0, <16 x i8>
; CHECK-NEXT: [[TMP2:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 64) to ptr), align 8
; CHECK-NEXT: [[TMP3:%.*]] = load i8, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 80) to ptr), align 8
; CHECK-NEXT: call void @llvm.donothing()
-; CHECK-NEXT: [[TMP4:%.*]] = bitcast <8 x i64> [[TMP1]] to i512
-; CHECK-NEXT: [[_MSCMP:%.*]] = icmp ne i512 [[TMP4]], 0
-; CHECK-NEXT: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
-; CHECK-NEXT: [[_MSCMP1:%.*]] = icmp ne i128 [[TMP5]], 0
-; CHECK-NEXT: [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP1]]
-; CHECK-NEXT: br i1 [[_MSOR]], label [[TMP6:%.*]], label [[TMP7:%.*]], !prof [[PROF1]]
-; CHECK: 6:
-; CHECK-NEXT: call void @__msan_warning_noreturn() #[[ATTR10]]
-; CHECK-NEXT: unreachable
-; CHECK: 7:
+; CHECK-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> [[TMP1]], <16 x i8> [[TMP2]], i8 -1)
+; CHECK-NEXT: [[TMP5:%.*]] = or <16 x i8> zeroinitializer, [[TMP4]]
; CHECK-NEXT: [[RES0:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmovs.qb.512(<8 x i64> [[X0:%.*]], <16 x i8> [[X1:%.*]], i8 -1)
-; CHECK-NEXT: [[TMP8:%.*]] = bitcast <8 x i64> [[TMP1]] to i512
-; CHECK-NEXT: [[_MSCMP2:%.*]] = icmp ne i512 [[TMP8]], 0
-; CHECK-NEXT: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
-; CHECK-NEXT: [[_MSCMP3:%.*]] = icmp ne i128 [[TMP9]], 0
-; CHECK-NEXT: [[_MSOR4:%.*]] = or i1 [[_MSCMP2]], [[_MSCMP3]]
-; CHECK-NEXT: [[_MSCMP5:%.*]] = icmp ne i8 [[TMP3]], 0
-; CHECK-NEXT: [[_MSOR6:%.*]] = or i1 [[_MSOR4]], [[_MSCMP5]]
-; CHECK-NEXT: br i1 [[_MSOR6]], label [[TMP10:%.*]], label [[TMP11:%.*]], !prof [[PROF1]]
-; CHECK: 10:
-; CHECK-NEXT: call void @__msan_warning_noreturn() #[[ATTR10]]
-; CHECK-NEXT: unreachable
-; CHECK: 11:
-; CHECK-NEXT: [[RES1:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmovs.qb.512(<8 x i64> [[X0]], <16 x i8> [[X1]], i8 [[X2:%.*]])
-; CHECK-NEXT: [[TMP12:%.*]] = bitcast <8 x i64> [[TMP1]] to i512
-; CHECK-NEXT: [[_MSCMP7:%.*]] = icmp ne i512 [[TMP12]], 0
-; CHECK-NEXT: [[_MSCMP8:%.*]] = icmp ne i8 [[TMP3]], 0
-; CHECK-NEXT: [[_MSOR9:%.*]] = or i1 [[_MSCMP7]], [[_MSCMP8]]
-; CHECK-NEXT: br i1 [[_MSOR9]], label [[TMP13:%.*]], label [[TMP14:%.*]], !prof [[PROF1]]
-; CHECK: 13:
-; CHECK-NEXT: call void @__msan_warning_noreturn() #[[ATTR10]]
-; CHECK-NEXT: unreachable
-; CHECK: 14:
+; CHECK-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> [[TMP1]], <16 x i8> [[TMP2]], i8 [[X2:%.*]])
+; CHECK-NEXT: [[TMP6:%.*]] = zext i8 [[TMP3]] to i128
+; CHECK-NEXT: [[TMP7:%.*]] = bitcast i128 [[TMP6]] to <16 x i8>
+; CHECK-NEXT: [[TMP12:%.*]] = or <16 x i8> [[TMP7]], [[TMP11]]
+; CHECK-NEXT: [[RES1:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmovs.qb.512(<8 x i64> [[X0]], <16 x i8> [[X1]], i8 [[X2]])
+; CHECK-NEXT: [[TMP8:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> [[TMP1]], <16 x i8> zeroinitializer, i8 [[X2]])
+; CHECK-NEXT: [[TMP9:%.*]] = zext i8 [[TMP3]] to i128
+; CHECK-NEXT: [[TMP10:%.*]] = bitcast i128 [[TMP9]] to <16 x i8>
+; CHECK-NEXT: [[TMP19:%.*]] = or <16 x i8> [[TMP10]], [[TMP8]]
; CHECK-NEXT: [[RES2:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmovs.qb.512(<8 x i64> [[X0]], <16 x i8> zeroinitializer, i8 [[X2]])
+; CHECK-NEXT: [[_MSPROP:%.*]] = or <16 x i8> [[TMP5]], [[TMP12]]
; CHECK-NEXT: [[RES3:%.*]] = add <16 x i8> [[RES0]], [[RES1]]
+; CHECK-NEXT: [[_MSPROP1:%.*]] = or <16 x i8> [[_MSPROP]], [[TMP19]]
; CHECK-NEXT: [[RES4:%.*]] = add <16 x i8> [[RES3]], [[RES2]]
-; CHECK-NEXT: store <16 x i8> zeroinitializer, ptr @__msan_retval_tls, align 8
+; CHECK-NEXT: store <16 x i8> [[_MSPROP1]], ptr @__msan_retval_tls, align 8
; CHECK-NEXT: ret <16 x i8> [[RES4]]
;
%res0 = call <16 x i8> @llvm.x86.avx512.mask.pmovs.qb.512(<8 x i64> %x0, <16 x i8> %x1, i8 -1)
@@ -6749,43 +6752,24 @@ define <16 x i8>@test_int_x86_avx512_mask_pmovus_qb_512(<8 x i64> %x0, <16 x i8>
; CHECK-NEXT: [[TMP2:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 64) to ptr), align 8
; CHECK-NEXT: [[TMP3:%.*]] = load i8, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 80) to ptr), align 8
; CHECK-NEXT: call void @llvm.donothing()
-; CHECK-NEXT: [[TMP4:%.*]] = bitcast <8 x i64> [[TMP1]] to i512
-; CHECK-NEXT: [[_MSCMP:%.*]] = icmp ne i512 [[TMP4]], 0
-; CHECK-NEXT: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
-; CHECK-NEXT: [[_MSCMP1:%.*]] = icmp ne i128 [[TMP5]], 0
-; CHECK-NEXT: [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP1]]
-; CHECK-NEXT: br i1 [[_MSOR]], label [[TMP6:%.*]], label [[TMP7:%.*]], !prof [[PROF1]]
-; CHECK: 6:
-; CHECK-NEXT: call void @__msan_warning_noreturn() #[[ATTR10]]
-; CHECK-NEXT: unreachable
-; CHECK: 7:
+; CHECK-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> [[TMP1]], <16 x i8> [[TMP2]], i8 -1)
+; CHECK-NEXT: [[TMP5:%.*]] = or <16 x i8> zeroinitializer, [[TMP4]]
; CHECK-NEXT: [[RES0:%.*]] = call <16 x i8> @llvm.x86.avx512.mask.pmovus.qb.512(<8 x i64> [[X0:%.*]], <16 x i8> [[X1:%.*]], i8 -1)
-; CHECK-NEXT: [[...
[truncated]
// PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
// This handler treats them all as truncation.
//
// TODO: use GetMinMaxUnsigned() to handle saturation precisely.
add whether the imprecision leads to false positives or negatives.
Done: 29e474f
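For illustration (my example, not the wording added in that commit): with unsigned saturation, uninitialized high bits of a source element can change the result even when the low byte is fully defined, which a truncation-based shadow cannot see — a potential false negative:

// Illustrative only: how truncation-based shadow can miss a dependency on
// uninitialized high bits for a saturating (pmovus-style) down convert.
#include <cstdint>
#include <cstdio>

int main() {
  uint64_t AShadow = 0x100;                // bit 8 uninitialized, low byte defined
  uint8_t ApproxShadow = (uint8_t)AShadow; // truncation says: fully defined
  // Yet the saturated result is 0xFF if bit 8 is 1 and 0x00 if it is 0,
  // so the output really does depend on an uninitialized bit.
  printf("approximated result shadow: %#x\n", ApproxShadow);
  return 0;
}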
✅ With the latest revision this PR passed the C/C++ code formatter.
Value *WriteThrough = I.getOperand(1);
Value *Mask = I.getOperand(2);

assert(isa<FixedVectorType>(A->getType()));
not necessarily in this PR, but maybe we want this because this is asserted all over the place?
[[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
  return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
}
Ack
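If such a helper were adopted, the paired asserts in handleAVX512VectorDownConvert could collapse to something like the following (sketch only, using the hypothetical isFixedIntVectorTy from the suggestion above):

// Sketch: inside handleAVX512VectorDownConvert, assuming the proposed helper.
assert(isFixedIntVectorTy(A->getType()));
assert(isFixedIntVectorTy(WriteThrough->getType()));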
// fully defined, but the truncated byte is ????????.
//
// TODO: use GetMinMaxUnsigned() to handle saturation precisely.
AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
Why is there so much whitespace?
Reduced, but leaving enough to honor our SpaceX colleagues
assert(I.arg_size() == 3);
Value *A = I.getOperand(0);
Value *WriteThrough = I.getOperand(1);
Value *Mask = I.getOperand(2);
Don't we need to shadow check the mask?
Done
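For context, a mask check in this file typically looks like the one-line sketch below (assuming the existing insertShadowCheck(Value *, Instruction *) helper; the exact change that landed may differ):

// Sketch: report uninitialized bits in the mask operand before it is used
// to select between shadows.
insertShadowCheck(Mask, &I);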
Inspired by a suggestion from Florian Google in #147606 (comment)
This handles llvm.x86.avx512.mask.pmov{,s,us}.*.512 using handleIntrinsicByApplyingToShadow() where possible, otherwise using a customized slow-path handler, handleAVX512VectorDownConvert().

Note that shadow propagation of pmov{s,us} (signed/unsigned saturation) is approximated using truncation. Future work could extend handleAVX512VectorDownConvert() to use GetMinMaxUnsigned() to handle saturation precisely.
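As a closing note on that future work, here is a standalone scalar sketch of how saturation could be handled precisely, assuming GetMinMaxUnsigned()-style bounds, i.e. the smallest and largest values consistent with the shadow (an assumption about that helper, not a description of its actual implementation):

// Standalone scalar sketch (not MSan code) of precise shadow propagation for
// an unsigned-saturating 64-bit -> 8-bit down convert.
#include <cstdint>

static uint64_t minUnsigned(uint64_t V, uint64_t Shadow) { return V & ~Shadow; }
static uint64_t maxUnsigned(uint64_t V, uint64_t Shadow) { return V | Shadow; }
static uint8_t saturateU8(uint64_t V) { return V > 0xff ? 0xff : (uint8_t)V; }

uint8_t pmovusShadowPrecise(uint64_t A, uint64_t AShadow) {
  // Saturation is monotonic, so if the smallest and largest feasible inputs
  // saturate to the same byte, the result is fully determined; otherwise
  // conservatively poison the whole element.
  uint8_t Lo = saturateU8(minUnsigned(A, AShadow));
  uint8_t Hi = saturateU8(maxUnsigned(A, AShadow));
  return Lo == Hi ? 0x00 : 0xff;
}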